1. AMAP, Univ. Montpellier, CIRAD, CNRS, INRAE, IRD, Montpellier, France
  2. Institute of Biology, Karl‐Franzens University of Graz, Graz, Austria
  3. Nicholas School of the Environment, Duke University, Durham (NC), USA
  4. Univ. Grenoble Alpes, INRAE, LESSEM, F-38402 St‐Martin‐d’Hères, France
  5. Eco&Sols, Univ. Montpellier, CIRAD, INRAE, IRD, Institut Agro, Montpellier, France
  6. Department of Ecology, French Institute of Pondicherry, Puducherry, India
  7. Department of Economics, University of Leipzig, Leipzig, Germany
  8. German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
  9. Smithsonian Tropical Research Institute, Balboa, Panama

1 Body of evidence

We used literature and data analyses of various nature to support our opinions.

The body of evidence used to support the opinion of the paper

Figure 1.1: The body of evidence used to support the opinion of the paper

Figure 1.1 illustrates how these elements articulate together. Following are the details of the three data analyses cited in the figure.

2 Analysis of tropical forest inventory data

In this section, we aim at exploring the intraspecific variability of tree growth in three natural hyperdiverse tropical forests. First, we estimate the intraspecific variance of annualised growth in the three datasets using a hierarchical bayesian model. Then, we test if we can detect spatial structure in individual mean growth. Finally, we test if intraspecific variability is higher than interspecific variability.

2.1 Datasets

2.1.1 Paracou

The Paracou forest is located in French Guiana. It is one of the best-studied lowland tropical forests in the Guiana Shield region. It belongs to the Caesalpiniaceae facies and has amongst the highest alpha-diversity in the Guiana shield with 150-200 tree species per hectare in inventories of trees with diameter at breast height (DBH) \(\ge\) 10 cm . The studied 93,75 ha area harbours 768 species, 254 genera and 65 families of plants. The Guiana Shield is characterized by Pre-Cambrian granitic and metamorphic geological formations, highly eroded. It is associated with gently undulating landscapes and a very dense hydrographic system. Paracou forest lies in a hilly area, on a formation called “série Armina” characterized by schists and sandstones and locally crossed by veins of pegmatite, aplite and quartz. The topography of the site consists of small hills separated by narrow ( < 5 m wide) sandy waterbeds. The altitude varies from 5 to about 45 m above sea level [2]. The mean annual temperature is 26 °C and winds are generally weak. There is a well marked dry season (from mid-August to mid-November) and a long rain season with a short drier period between March and April (mean annual rainfall of 3,041 mm). Different study programs have been led at the Paracou site, which is managed by the CIRAD but is open to the scientific community. Here, we used data from a sylvicultural experiment called the “disturbance experiment” where 15 plots of 6,25 ha were exposed to four different logging conditions between 1986 and 1988. Since then, cartesian coordinates, DBH, species identity and survival of each tree with a DBH \(\ge\) 10 cm have been collected every one or two years, during the dry season (mid-August to mid-November). The final dataset contains ca. 46000 tree measurements.

2.1.2 Uppangala

The Uppangala Permanent Sample Plot (UPSP) is located in South-East Asia, in the Western Ghats of India, established in 1989 by the French Institute of Pondicherry in the Kadamakal Reserve Forest, in the Pushpagiri Wildlife Sanctuary, in Karnataka state, India [3]. It is a low altitude (500-600 m) wet evergreen monsoon Dipterocarp forest [4]. The studied area, of 5.07 ha, is quite steep, with a mean slope angle of about 30–35°. The plots are five north–south oriented transects, 20 m wide, 180 to 370 m long, and 100 m apart center to center and three rectangular plots which overlap the transects. The transects gather data from 1990 to 2011 and the rectangular plots from 1993 to 2011. The trees with GBH (Girth at Breast Height) \(\ge\) 30 cm (equivalent to ca. 9.5 cm DBH) were measured every 3 to 5 years. The final dataset contains measurements of 3870 trees and 102 species (including 2 morphospecies). This forest is considered as one of the rare undisturbed tropical forests in the world [5].

2.1.3 BCI

The Barro Colorado Island site is located in central America, in Panama, covered by a lowland tropical moist forest. The zone became an island after a valley was flooded in order to build the Panama Canal, in 1913. It is nowadays considered as the most intensively studied tropical forest in the world. The studied site is a 50 ha plot (500 \(\times\) 1000 m). It has an elevation of 120 m and is quite flat (most slopes are gentler than 10°). Complete censuses of all trees with DBH \(\ge\) 1 cm have been performed every 5 years since 1980. The dataset contains measurements of 328 tree species and 423617 trees, and when only taking into account trees with DBH \(\ge\) 10 cm, it contains measurements of 255 species and 37224 trees. The dataset is available at [6].

2.2 Data preparation

2.2.1 Paracou

We first charge the original data (plots 1 to 15).

Plots at the Paracou site. Each point is a tree.

Figure 2.1: Plots at the Paracou site. Each point is a tree.

Figure 2.1 shows the location of the trees in Paracou.

We then compute annualised growth between two censuses and rename all indeterminate species with the same name. Growth is computed as the difference of DBH (Diameter at Breast Height) between two consecutive censuses, divided by the time period between those two censuses.

There are 615 species, 78434 trees and 1448802 observations in this dataset.

Then we remove the observations where growth < -2 mm/an or > 100 mm/an and remove the years before the perturbations were performed and the biodiversity plots were added (1985-1991).

There are 615 species and 78434 and 1053241 trees at this step.

We finally compute mean annual growth. Mean growth is computed as the difference of DBH between the first and the last time a tree was measured, divided by the period between those two measures.

2.2.2 Uppangala

We first charge the original data.

Plots at Uppangala site. Each point is a tree.

Figure 2.2: Plots at Uppangala site. Each point is a tree.

Figure 2.2 shows the location of the trees in Uppangala.

There are 102 species, 3870 and 69459 observations in this dataset.

We then remove the census dates which were not common for all plots.

We then compute annualised growth and remove the lines with a growth higher than 100 mm/y or inferior to -2 mm/y.

Thus there are 102 species, 3725 trees and 57921 observations after annual growth computation.

We finally compute mean annual growth.

2.2.3 BCI

We first load the original data.

There are 328 species, 423617 trees and 3388936 observations in this dataset.

We then remove the lines with missing DBH measures and trees with DBH < 10 cm for consistence with the other datasets.

The 50 ha plot of the Barro Colorado Island site. Each point is a tree.

Figure 2.3: The 50 ha plot of the Barro Colorado Island site. Each point is a tree.

Figure 2.3 shows the location of the selected trees in BCI.

We then compute annualised growth and remove lines with a growth higher than 100 mm/y or inferior to -2 mm/y.

There are now 244 species and 30386 trees and 167618 observations.

We finally compute mean annual growth.

2.3 Abundance diagrams

Abundance diagrams of each siteAbundance diagrams of each siteAbundance diagrams of each siteAbundance diagrams of each site

Figure 2.4: Abundance diagrams of each site

These diagrams (Figure 2.4) can help to visualise how diverse the tropical forest is, but also that there are few dominant species and many rare species (long threshold).

2.4 Estimation of IV

We quantified the importance of intraspecific variability compared to interspecific variability. To do so, we used Stan to run a Bayesian hierarchical model, with a species random effect and an individual random effect on the intercept. In order to compare to classic growth models, and estimate an intraspecific variability within diameter classes, we also added an intercept and a diameter fixed effect. We used the brms package [7,8] in Rstudio to implement the model. We scaled the data to improve convergence time.

\(ln(G_{ijt} + 2) = [\beta_0 + b_{0j} + d_{0i}] + \beta_1 \times ln(D_{ijt}) + \epsilon_{ijt},\)

Priors

\(\beta_0 \sim \mathcal{N}(mean=0, var = 1), iid\)

\(\beta_1 \sim \mathcal{N}(mean=0, var = 1), iid\)

\(b_{0j} \sim \mathcal{N}(mean=0, var=V_b), iid\)

\(d_{0j} \sim \mathcal{N}(mean=0, var=V_d), iid\)

\(\epsilon_{ijt} \sim \mathcal{N}(mean=0, var=V), iid\)

Hyperpriors

\(V_b \sim \mathcal{IG}(shape = 10^{-3}, rate = 10^{-3}), iid\)

\(V_d \sim \mathcal{IG}(shape = 10^{-3}, rate = 10^{-3}), iid\)

\(V \sim \mathcal{IG}(shape = 10^{-3}, rate = 10^{-3}), iid\)

Where \(i\) is the individual and \(j\) the species. The model includes two fixed effects (\(\beta_0\) and \(\beta_1\)), a random species effect \(b_{0j}\) on the intercept and a random individual effect \(d_{0i}\) on the intercept. \(V_d\) is the intraspecific variance and \(V_b\) is the interspecific variance.

2.4.1 Paracou

Trace of the posterior estimates for Paracou

Figure 2.5: Trace of the posterior estimates for Paracou

Density of the posterior estimates for Paracou

Figure 2.6: Density of the posterior estimates for Paracou

Here the intraspecific variability is about the same as interspecific variability.

2.4.2 Uppangala

We apply the same model on the Uppangala dataset.

Trace of the posterior estimates for Uppangala

Figure 2.7: Trace of the posterior estimates for Uppangala

Density of the posterior estimates for Uppangala

Figure 2.8: Density of the posterior estimates for Uppangala

We can also conclude that intraspecific variability is high, but even that it is higher than interspecific variability.

2.4.3 BCI

We apply the same model to the BCI dataset.

Trace of the posterior estimates for BCI

Figure 2.9: Trace of the posterior estimates for BCI

Density of the posterior estimates for BCI

Figure 2.10: Density of the posterior estimates for BCI

Here, intraspecific variability is slower than interspecific variability, although it is still about half the order of magnitude, which is significant. Note that here the mean estimated fixed effect of diameter on growth is slightly negative, which is counter-intuitive.

Table 2.1: Mean estimates and their estimation errors for the three models
Intercept (\(\beta_0\)) Diameter (\(\beta_1\)) Species variance (\(V_b\)) Individual variance (\(V_d\)) Residual variance (\(V\))
Paracou
Estimate 6.4e-02 3.3e-02 4.7e-01 4.6e-01 7.7e-01
Estimation error 2.4e-02 4e-03 1.7e-02 4.1e-03 2.3e-03
Uppangala
Estimate 9.5e-02 1.9e-01 3.8e-01 6.6e-01 5.9e-01
Estimation error 4.7e-02 1.2e-02 4.4e-02 8.6e-03 1.9e-03
BCI
Estimate 1.8e-01 -2.2e-02 6.7e-01 4.1e-01 8.1e-01
Estimation error 4.3e-02 4.6e-03 3.7e-02 4.1e-03 2e-03

Table 2.1 enables to compare the results of the models for the three datasets.

We compute the proportion of unexplained variance of growth that is contained in the individual effects.

Proportions of each variance components for the three datasets. The variance of individual effects is in blue, the variance of species effect is in purple and the variance of residuals is yellowProportions of each variance components for the three datasets. The variance of individual effects is in blue, the variance of species effect is in purple and the variance of residuals is yellowProportions of each variance components for the three datasets. The variance of individual effects is in blue, the variance of species effect is in purple and the variance of residuals is yellow

Figure 2.11: Proportions of each variance components for the three datasets. The variance of individual effects is in blue, the variance of species effect is in purple and the variance of residuals is yellow

Overall, a large part of variability is imputable to individual effects, showing a high intraspecific variability in growth in tropical forests, even when taking the effect of diameter on growth into account in the model (Figure 2.11).

2.5 Moran’s I analysis

In order to obtain a numeric evaluation of spatial autocorrelation, we performed a Moran’s test on individual mean growth for each species. Moran’s test function was re-written from the ape package’s Moran.I function with a one-tailed test [9] in order to be able to select the pairs of neighbours which are less than 100 m apart and thus assess local spatial autocorrelation.

We computed the proportion of species with significant positive spatial autocorrelation with a 0.05 alpha risk and the proportion of species with non significant spatial autocorrelation. The proportions are computed relative to the species on which the test was computed.

Table 2.2: Proportions of species and individuals with a significant Moran’s I test result
Significant (i) Not significant (ii)
Paracou
Proportion of species 28.7 71.3
Proportion of individuals 77.7 22.3
Uppangala
Proportion of species 18.5 81.5
Proportion of individuals 45.3 54.7
BCI
Proportion of species 16.9 83.1
Proportion of individuals 51.4 48.6

We observe that the proportion of species with significant spatial autocorrelation is low (29% in Paracou, 18% for Uppangala, 17% for BCI, Table 2.2). However, the average number of individuals represented in each category of species (with significant or not significant spatial autocorrelation) shows that there is a much higher number of individuals in species with significant spatial autocorrelation. This motivates to analyse the proportion of individuals rather than the number of species. We thus test if species abundance significantly explains the significance of Moran’s test and find that it is largely significant.

For all three datasets, the effect of the number of individuals on test significance is significant. However, it is less significant in Uppangala. For Paracou, the proportion of individuals belonging to a species which had a significant spatial autocorrelation is high (> 77%). For Uppangala and BCI, it is lower (~45% and 51%, respectively).

Important features contrasting between those forests are the topography, which is hillier in Uppangala, the number of species and of individuals per species, which are higher in Paracou, and the date of the last known disturbance, which is much more recent in Paracou than in BCI and Uppangala. Moreover, the same analysis of the undisturbed plots of Paracou give about the same proportion of individuals (exposed lower).

Finally, the magnitude of spatial autocorrelation varied across the three sites, with Paracou presenting the biggest proportion and Uppangala the lowest, which can be related to each site specificities. The plots of the Paracou site were subjected to disturbance treatments that created numerous large gaps [10]. This resulted in a high heterogeneity and spatial structuration of the environment. When restricting our analysis to undisturbed control plots, the proportion of detected spatial autocorrelation in conspecific growth drops from 77.1% to about 50% individuals, supporting the hypothesis that the more important spatial autocorrelation in Paracou is due to the stronger spatial structuration of the micro-environment associated with disturbance treatments. Conversely, the disturbance regime in BCI is known to produce frequent small gaps [11], while the mature forest in Uppangala has been reported as barely disturbed [3]. We acknowledge that the hilly topography of Uppangala could make our estimates of distances among individuals based on pairs of tree coordinates provided by field inventory less accurate than in the other two flatter sites.

2.6 Is variance within conspecifics inferior to variance within heterospecifics ?

In order to compare intra- and interspecific variance, we computed the semivariance of growth of individuals that are less than 100 m apart. We compared the semivariances obtained with conspecifics and heterospecifics with a Mann-Whitney test. Semivariance is estimated as the mean of the squared difference of the mean growth of all pairs individuals of the focal species which are less than 100 m apart. Interspecific variance is estimated with the semivariance computed as the mean of the squared difference of the mean growth of all pairs individuals of the containing the focal species and another species which are less than 100 m apart.

Table 2.3: Proportions of species and individuals with a significant semivariance comparison test result
Intraspecific variability < interspecific variability (i) Intraspecific variability ~ interspecific variability (ii) Intraspecific variability > interspecific variability (iii)
Paracou
Proportion of species 61.7 39.7 0.333
Proportion of individuals 89.1 10.9 0.0699
Uppangala
Proportion of species 42.2 62.2 4.44
Proportion of individuals 57.7 23.6 18.8
BCI
Proportion of species 46.6 44.3 9.09
Proportion of individuals 75.2 17.2 7.6

Computing interspecific variance with a sampling of maximum ten individuals per species, we obtained a theoretical interspecific variance, which does not take the actual proportions of species into account.

Table 2.4: Proportions of species and individuals with a significant semivariance comparison test result, using samples to estimate interspecific variance.
Intraspecific variability < interspecific variability (i) Intraspecific variability ~ interspecific variability (ii) Intraspecific variability > interspecific variability (iii)
Paracou
Proportion of species 61.7 37 1.33
Proportion of individuals 86.3 12.1 1.56
Uppangala
Proportion of species 42.2 55.6 2.22
Proportion of individuals 61.2 20.4 18.4
BCI
Proportion of species 50 47.7 2.27
Proportion of individuals 71.2 27.8 1.02

In all forests, in about half of the species, local intraspecific growth semivariance is significantly smaller than local semivariance of heterospecifics growth. In all forests, a large majority of individuals belong to species in which intraspecific growth semivariance is significantly smaller than interspecific growth semivariance (Table 2.3). Sampling individuals within species to estimate interspecific variance did not change the main results (Table 2.4).

2.7 Other tests and verifications

2.7.1 Undisturbed plots in Paracou

One of our main hypotheses is that many environmental variables are spatially structured. Consequently, as tree growth is largely influenced by environmental variables, it should also be spatially structured. Our analysis using Moran’s I test showed that growth was indeed spatially structured. The Paracou dataset offers the opportunity to test this hypothesis further. Indeed, some plots were disturbed in the early eighties, creating artificial gaps, whereas others were not disturbed. As the creation of gaps results in a strong spatial structure of the available light under the canopy, growth should be less structured in plots that were not disturbed. That is what we test here.

This leads to a dataset containing 14266 trees, 451 species and 14266 observations.

Table 2.5: Proportions of species and individuals of the Paracou site which had a significant or insignificant Moran’s test results
Significant (i) Not significant (ii)
Proportion of species 18 82
Proportion of individuals 54.7 45.3

The proportion of individuals represented in the species which have a significant Moran’s test are about 50%. With the disturbed plots, it was more than 70% (Table 2.5). This is due to the openness of the canopy triggering important spatial structure in disturbed plots due to light gaps. In these gaps, trees tend to grow faster, and therefore the spatial structure of growth is stronger.

2.7.2 Smaller stems in BCI

In order to be able to compare all three datasets together, we removed all stems that had a DBH inferior to 10 cm in BCI, where all stems with DBH \(\ge\) 1 cm are measured. Using the complete dataset, we concluded that the spatial structure in individual growth was even more significant.

2.7.3 Other discussion of the results

We here used data on tree growth, which is one component of individual fitness and an imperfect proxy of tree competitive ability, and our analyses could be further tested with other traits in the future.

3 Analysis of an Eucalyptus clonal plantation dataset

In this analysis we test whether intraspecific variability (IV) in observed individual tree growth can emerge from the environment. To this aim, we use a clonal experimental setup. The EUCFLUX common garden is a clonal trial where 14 Eucalyptus clones are grown in a replicated, statistically-sound design. One of its main goals is to determine and compare the productivity of each clone. Our hypothesis on growth IV is that it mainly results from responses to environmental factors, and not only from intrinsic (here more precisely genetic) factors. Therefore, we use the EUCFLUX dataset in order to quantify IV of growth within clones, i.e. in a dataset where genetically-driven IV is nil. Following our hypothesis, we expect to detect IV in growth within single genotypes, although we also expect the genotype to influence the growth response.

3.1 The dataset

The EUCFLUX experiment is located in Brazil, in the state of São Paulo. It includes 14 clones of 5 different species or hybrids, planted at the same date and grown in 10 replicated blocks of 100 trees each, which were monitored during 6 years. Each replicated block is distant by up to 1.5 km in a 200 ha stand showing some small changes in soil properties. The details of the setup can be found in [12]. We used the DBH measured during 5 complete censuses in order to compute annualised growth in mm/year. The raw data of the experiment was manually rearranged into six files with LibreOffice Calc, each file corresponding to a complete census. We computed annualised growth of each tree in mm/y as the difference between two consecutive censuses divided by the time between the two censuses. In case of mortality of the tree between two censuses, the data was discarded. We computed the neperian logarithm of diameter and of growth (with a constant for growth in order to avoid undefined values).

The final dataset included 64125 DBH measurements corresponding to 13531 trees.

The original data without negative growth values (A) and the log-transformed growth (B).

Figure 3.1: The original data without negative growth values (A) and the log-transformed growth (B).

Figure 3.1 shows the distribution of growth after removing negative values, and the latter with log-transformed values.

Design of a plot. Each point is a tree and the associated number is the tag of the tree.

Figure 3.2: Design of a plot. Each point is a tree and the associated number is the tag of the tree.

Figure 3.2 shows the disposition of the trees in a single plot. There are 14 clones times 10 repetitions, so 140 plots with this same design.

3.2 Competition index

Plot of the growth versus the diameter. Each colour represents a tree age.

Figure 3.3: Plot of the growth versus the diameter. Each colour represents a tree age.

Figure 3.3 shows the age of the trees has a big influence on the values of growth but also on the relationship between growth and diameter : the slope is smaller with time, indicating that for the same diameter, growth is slower through time. This is likely an effect of competition for light and possibly underground resources, since as the trees grow their capacity to capture resources increases. Therefore, we computed a competition index to integrate this effect in the growth model. The competition index is computed for each tree which is not on the edge of a plot. It is the sum of the basal areas of the 8 direct neighbours divided by the area of the rectangle that comprises all the direct neighbours. It was then log-transformed.

\(C_{i, t} = \sum BA_{neighbours(i, t)}\)

3.3 Statistical growth model

In order to partition the variance of individual growth data, we built a hierarchical Bayesian model and used Stan and the package brms [7,8] in Rstudio to implement it in R. We used the following parameters : n.adapt = 1000 ; n.burn = 1000 ; n.iter = 5000 ; n.thin = 5. Our model incorporated a fixed effect on the intercept (\(\beta_0\)), on the slope of diameter D (\(\beta_1\)),and on the competition index C (\(\beta_2\)) and several random effects, namely temporal (date of census, \(b_t\)), individual (tree identifier, \(b_i\)), spatial (block, \(b_b\)), and genotype (\(b_g\)).

Variables are scaled to help the convergence of the model.

\(ln(G_{it}+1) = (\beta_0 + b_i + b_b + b_g + b_t) + \beta_1 \times ln(D_{it}) + \beta_2 \times ln(C_{it}) + \epsilon_{it}\)

Priors

\(\beta_0 \sim \mathcal{N}(mean=0, var = 1), iid\)

\(\beta_1 \sim \mathcal{N}(mean=0, var = 1), iid\)

\(\beta_2 \sim \mathcal{N}(mean=0, var = 1), iid\)

\(b_i \sim \mathcal{N}(mean=0, var=V_i), iid\)

\(b_b \sim \mathcal{N}(mean=0, var=V_b), iid\)

\(b_g \sim \mathcal{N}(mean=0, var=V_g), iid\)

\(b_t \sim \mathcal{N}(mean=0, var=V_t), iid\)

\(\epsilon_{it} \sim \mathcal{N}(mean=0, var=V), iid\)

Hyperpriors

\(V_i \sim \mathcal{IG}(shape = 10^{-3}, rate = 10^{-3}), iid\)

3.4 Results of the model and variance partitioning

After convergence of the model (checked visually), we examined the variance of each random effect, and this enabled us to perform a variance partitioning.

Trace of the posteriors of the inferred parameters

Figure 3.4: Trace of the posteriors of the inferred parameters

Density of the posteriors of the inferred parameters

Figure 3.5: Density of the posteriors of the inferred parameters

Trace of the temporal random effects

Figure 3.6: Trace of the temporal random effects

Trace of the genotype random effects

Figure 3.7: Trace of the genotype random effects

Trace of the spatial (block) random effects.

Figure 3.8: Trace of the spatial (block) random effects.

Mean values and 95% confidence interval of the temporal, genetic and spatial and random effects.

Figure 3.9: Mean values and 95% confidence interval of the temporal, genetic and spatial and random effects.

Table 3.1: Mean posteriors of the model and their estimation errors.
Intercept (\(\beta_0\)) Diameter (\(\beta_1\)) Competition (\(\beta_2\)) Individual variance (\(V_i\)) Block variance (\(V_b\)) Genetic variance (\(V_g\)) Temporal variance (\(V_t\)) Residuals variance (\(V\))
Estimate -2.5e-02 5.5e-01 -2.7e-01 2.3e-01 5.5e-02 1.3e-01 1.2e+00 5.1e-01
Estimation error 4.6e-01 5e-03 9e-03 4.1e-03 1.6e-02 2.7e-02 5.5e-01 2e-03

We found that the two most important contributors to variance were the date and individual identity. High estimation error for the intercept and the temporal random effect must be noted. The proportion of variance represented by each random effect and the residual variance are computed to visualise the variance partition.

Proportion of each variance component of the unexplained variance.

Figure 3.10: Proportion of each variance component of the unexplained variance.

The model showed that individual tree growth was a function of tree size and competition with neighbouring trees, and that variance around this model was mostly due to a temporal effect as well as an individual effect (Table 3.1, Figure 3.2). The effect of the genotype was quite small, and the effect of block was even smaller. The temporal random effects declined with the date (Figure 3.9, panel A), showing that the effect of the date on growth is negative (the older the trees become, the less they can grow). We attribute this tendency to competition. Therefore, the competition index C did not fully capture the effect of competition on growth. Another explanation is that the diameter slope was not able to fully capture the decrease of tree growth with size, maybe due to geometrical effects of distributing growth around increasing diameter or physiological constraints linked to height. The temporal effect explained the highest fraction of variance (3.1, 3.2). This is due to the negative effect of competition for light, water, and/or nutrients on growth, which increases with the growth of the trees planted at high densities, and to physiological changes occurring with age. Importantly, variability between individuals is much higher than between genotypes. This shows that there is individual variability even if the trees are clones and, therefore, that individual variability is only caused by exogenous factors. This individual effect can be due to the micro-environment where the tree is positioned, but also to some individual history, such as seedling manipulation and plantation. The genotype explained only 6.1% of the variance. Therefore the impact of individual identity on growth is stronger than the effect of genotype. The block had the littlest impact with 2.5% of the variance explained. This means that the physical environment between blocks is quite homogeneous, or that the physical environment does not play a big role in growth. As the experimental design aimed at minimizing environmental variations and selected productive genotypes able to accommodate several environmental conditions [12], this dataset is a strongly conservative test case for our hypothesis.

3.5 Conclusion

Overall, we found that there is IV within clonal tree plantations. This shows that the environmental factors (in a broad sense : not genetic) have a strong impact on growth and that IV can indeed emerge within a clone.

4 Virtual experiment with two species

Intraspecific variability (IV) is often seen as an unstructured, intrinsic property of individuals, mostly genetic. Here, we investigate whether it can result from the species responses to the spatial variations of the environment. Our hypothesis is that the individuals can be clones, i.e. have no genetic differences between them, and still have different measured attributes because the environment in which they strive varies, as shown in the previous analysis of a clonal dataset. The response of an individual results from the integrated effects of the environmental conditions it has encountered during its life (local environmental variation, or microsite effect) and its genetic features. Moreover, the differences between individuals due to environmental variation does not imply a broader overlap of the species niches. Therefore, this type of IV has no effect on species coexistence directly. We designed and conducted a virtual experiment that aims at illustrating that IV, or “individual effects,” can result from unobserved variation in some environmental variables [13], therefore accounting for multidimensional species differences which cannot be observed on a few niche axes.

4.1 Perfect knowledge model : generating response values.

To do so, we first considered a model that depicts the response of a process, e.g. growth, for individual clones of two different species to all the environmental variables that influence it. Henceforth denoted the “Perfect knowledge model,” it represents the perfect knowledge of the process as it occurs in the field and thus includes no residuals :

\(Y_{ijt} = \beta_{0j} + \beta_{1j} * ln(X_{1ijt}) + \beta_{2j} * X_{2ijt} + \ldots + \beta_{Nj} * X_{Nijt}\) “Perfect knowledge model”

where \(1 \leq i \leq 500\) are the individuals ; \(1 \leq j \leq 2\) are the species ; \(Y\) is a response variable, for instance growth ; \(X1\) to \(XN\) are explanatory variables, for instance environmental variables, that can vary with time \(t\) ; and the \(\beta_{k,i}\) , \(1 \leq k \leq N\) depict the species-specific responses to these environmental variables. As we consider clones, individuals of the same species respond the same way to environmental variables, and variation in \(Y\) among individuals within species is due to differences in the environment where each individual thrives. The first explanatory variable was natural log-transformed to represent a log-log relationship as would be the relationship between tree growth and light.

We fix the species parameters of the “Perfect knowledge model” as follows, using 10 environmental variables (N=10) :

Parameters of species 1 : \(\beta_0\) = 0.25, \(\beta_1\) = 0.15, \(\beta_2\) to \(\beta_{6}\) are chosen randomly between -0.05 and 0.05 and \(\beta_{7}\) to \(\beta_{10}\) are chosen randomly between 0 and 0.05.

Parameters of species 2 : \(\beta_0\) = 0.2, \(\beta_1\) = 0.1, \(\beta_2\) to \(\beta_{10}\) are chosen the same way as for species 1.

The difference between the species is imposed by those parameters. Here, species 1 is more competitive on average thanks to its higher intercept (\(\beta_0\)) and reaction to the first environmental variable (\(\beta_1\)). The first environmental variable (\(X1\)) has a higher weight in the computation of the response variable, as would be the most limiting factor for the response variable.

To account for genetic variability in our generated data, we added an IV in species parameters by sampling each individual parameter in a normal distribution centered on the species mean parameter and with a standard deviation of a quarter of the species mean parameter.

To represent the spatialised environment of such a forest plot, we build a 2D matrix of dimension 500 \(\times\) 500 for each environmental variable at a time \(t_0\), by randomly generating them with spatial autocorrelation. We here considered that all environmental variables were independent. Each variable was simulated by using the gstat R package [14,15], enabling to create autocorrelated random fields. A spherical semivariogram model was used for each of the ten environmental variables, with a mean of 0 for each explanatory variable (beta = 0), a sill of 1 (psill = 1) for each and a range of 50 (range = 50).

We then considered that \(Y\) had been measured at two times, \(t_0\) and \(t_1\), for each individual as it would be under periodic forest plot censuses for instance. A pair of coordinates within this spatialised environment was randomly assigned to each of the 500 individuals. We considered that some of the environmental variables (light, temperature, humidity, nutrient availability for instance) had changed between \(t_0\) and \(t_1\) and others had not (slope, altitude for instance). For the first environmental variable which had the stronger impact on \(Y\) values and another randomly drawn environmental variable, we computed values at \(t_1\) as \(t_0 + \epsilon\), \(\epsilon \sim \mathcal{N}(0, 0.1)\) (they randomly increase or decrease). For two other environmental variables randomly drawn, \(X_{t_1} = X_{t_0} + |\epsilon|, \epsilon \sim \mathcal{N}(0, 0.1)\) (they increase) and for two other environmental variables, \(X_{t_1} = X_{t_0} - |\epsilon|, \epsilon \sim \mathcal{N}(0, 0.1)\) (they decrease).

This led to two repeated measurements of \(Y\) for each individual of each species, i.e. 2000 values of \(Y_{i,j,t}\), with \(i=[1;2]\) ; \(j=[1,50]\) ; \(t=[t_0;t_1]\).

Histogram of raw and log-transformed Y with and without genetic IV

Figure 4.1: Histogram of raw and log-transformed Y with and without genetic IV

Raw and log-transformed Y versus X1 plots with and without genetic IV

Figure 4.2: Raw and log-transformed Y versus X1 plots with and without genetic IV

Figure 4.1 shows the distribution of the data and Figure 4.2 shows the relationship between the response Y and the environmental variable X1.

Virtual landscape representing X1 and the individual with their respective Y value.

Figure 4.3: Virtual landscape representing X1 and the individual with their respective Y value.

Figure 4.3 shows that microsite effects, which are the effects of the local multidimensional environment on the observed variable, can result in local reversals of the competitive hierarchy between species. Indeed, the local outcome of competition can be opposite to the mean hierarchy : at one point of the space-time, an individual of Species 1 can overcome an individual of Species 2, whilst Species 1 is, on average, the fittest of the two species. Consequently, microsite effects foster the coexistence of Species 1 and Species 2.

4.2 Partial knowledge model : interpreting the response values with only one explanatory variable.

These two virtual datasets (with and without genetic variability) were then analysed with a “Partial knowledge model,” which represents the process understanding ecologists can derive using these datasets and from their incomplete characterisation of the environment : only a few variables (here only one, \(X_1\)) are actually measured and taken into account.

\(\ln(Y_{ijt}) = [\beta'_{0j} + b_{0i}] + \beta'_{1j} \ln(X_{1ijt}) + \varepsilon_{ijt}\) “Partial knowledge model”

Priors :

\(\varepsilon_{ijt} \sim \mathcal{N}(0,V_j)\), \(j = [1, 2]\), iid

\(\beta'_{kj}\sim \mathcal{N}_2(0, 1)\), \(k = [0, 1]\), \(j = [1, 2]\), iid

\(b_{0i} \sim \mathcal{N}(0, V_{bj})\), \(i = [1, 500]\), \(j = [1, 2]\), iid

Second level priors :

\(V_j \sim \mathcal{IG}(10^-3, 10^-3)\), \(j = [1, 2]\), iid

\(V_{bj} \sim \mathcal{IG}_2(10^-3, 10^-3)\), \(j = [1, 2]\), iid

This model includes a random individual effect on the intercept (\(b_{0i}\)) allowing to account for any variability among individuals within species regarding this parameter. We ran this model twice, with the datasets generated with and without genetic IV. We then acquired the species parameters and the IV generated with genetic IV and get only IV from the perfect knowledge model. These models were fitted with Bayesian inference thanks to Stan, using the brms package [7,8] in Rstudio, with 100000 iterations, a warming period of 50000 iterations and a thinning of 50, thus we finally obtain 1000 estimates per parameter.

We visualised the convergence and the results of the models thanks to trace and density plots.

Trace of the posterior of the model for Species 1 without genetic IV

Figure 4.4: Trace of the posterior of the model for Species 1 without genetic IV

Density of the posterior of the model for Species 1 without genetic IV

Figure 4.5: Density of the posterior of the model for Species 1 without genetic IV

Trace of the posterior of the model for Species 2 without genetic IV

Figure 4.6: Trace of the posterior of the model for Species 2 without genetic IV

Density of the posterior of the model for Species 2 without genetic IV

Figure 4.7: Density of the posterior of the model for Species 2 without genetic IV

Trace of the posterior of the model for Species 1 with genetic IV

Figure 4.8: Trace of the posterior of the model for Species 1 with genetic IV

Density of the posterior of the model for Species 1 with genetic IV

Figure 4.9: Density of the posterior of the model for Species 1 with genetic IV

Trace of the posteriors of the model for Species 2 with genetic IV

Figure 4.10: Trace of the posteriors of the model for Species 2 with genetic IV

Density of the posteriors of the model for Species 2 with genetic IV

Figure 4.11: Density of the posteriors of the model for Species 2 with genetic IV

## Warning: Parts of the model have not converged (some Rhats are > 1.05). Be careful when analysing the results! We recommend running
## more iterations and/or setting stronger priors.
Table 4.1: Mean posteriors and their estimation error
\(\beta_0\) \(\beta_1\) \(V_b\) \(V\)
Species 1
Estimate 6.4e-02 2.9e-01 9.5e-02 1.9e-03
Estimation error 4.7e-03 2.4e-03 3e-03 6.1e-05
Species 2
Estimate 1.3e-01 1.5e-01 7.6e-02 5.2e-04
Estimation error 3.3e-03 6.3e-04 2.4e-03 1.7e-05

We inferred a high IV even in the absence of genetic IV (4.1). Therefore, observed IV does not necessarily reveal intrinsic (mainly genetic) IV, but can also reveal hidden dimensions of the environment. The mean and quantiles of the results of the models were used to visualise the inferred link between \(Y\) and \(X1\). To do so, we created a sequence of explanatory variable values and computed the response variable with the mean and quantiles of the parameters inferred with the models.

Plot of the real values - points - and estimated mean - bold line - and 95 % confidence interval - thin lines - of Y versus X1. The dotted lines correspond to the 95 % interval due to genetic IV.

Figure 4.12: Plot of the real values - points - and estimated mean - bold line - and 95 % confidence interval - thin lines - of Y versus X1. The dotted lines correspond to the 95 % interval due to genetic IV.

The virtual landscape of X1 values and of Y values and the corresponding estimated relationship between X1 and Y.

Figure 4.13: The virtual landscape of X1 values and of Y values and the corresponding estimated relationship between X1 and Y.

In Figure 4.12 and panel A of Figure 4.13, the solid and bold lines represent the mean rate of the response variable (e.g. growth) of Species 1 (blue) and Species 2 (orange) as computed with the parameters retrieved from the model with or without genetic variability respectively. The plain lines represent the 95% interval of the posteriors from the model without genetic variability and the dotted lines show the 95% confidence interval of the posteriors from the model with genetic variability. Figure 4.12 shows that genetic IV increases the overlap between the response of Species 1 and Species 2.

Figure 4.13 represents the virtual landscape of \(X_1\) and the corresponding individual response and the plot of \(Y\) versus \(X1\) (without genetic variability) next to each other, helping to visualise this virtual experiment.

In the model without genetic IV, the unobserved variation in the environment resulted in an important “individual effect.” The performances of the two species can intersect, which means that the competitive hierarchy among the two species can be locally reversed depending on the micro-environmental variation, offering opportunities for the coexistence of the two species in a variable environment. In that sense, incorporating the individual level in statistical models can help to explain the coexistence of species through a better integration of niche multidimensionality in models.

4.3 Spatial autocorrelation of individual response

We then analysed the spatial structure of individual the response variable. We hypothesised that as environmental variables are spatially autocorrelated, and the response variable is the result of these variables, then the response variable should also be spatially autocorrelated. To test this, we computed Moran’s I test. It is computed with the Moran.I function of the ape R package [9].

Table 4.2: Results of the Morans’s I test for both species separatly and together.
Moran’s I index P-value
Species 1 6.7e-02 0e+00
Species 2 6.9e-02 0e+00
All individuals 4.3e-02 0e+00

Table 4.2 shows that the individual response variable is largely spatially autocorrelated. This is due to the spatial autcorrelation in the environmental variables. In this simple, completely controlled experiment, this is natural. However, in a less controlled environment, detecting spatial autocorrelation in a response variable could be the sign of the spatial structure of the underlying environmental variables although other factors like the genetic structure could lead to spatial autocorrelation in the response variable.

4.4 Similarity between conspecific individuals compared to heterospecific individuals and consequences for species coexistence.

Finally, we used semivariograms to visualise the spatial autocorrelation of the response variable and to test whether the individual growth was more similar within conspecifics than within heterospecifics. The semivariograms were computed and modelled with the variogram and the fit.variogram functions of the gstat R package ([15]) respectively. The variogram models were spherical.

Estimated and empirical semivariograms of Y for each species and both species together.

Figure 4.14: Estimated and empirical semivariograms of Y for each species and both species together.

As the semivariance between individuals of both species was higher than the semivariance between conspecifics (4.14), the individual response variable was more similar between conspecifics than heterospecifics. Considering this response variable as a proxy of fitness, and linking fitness to competition, we argue that in a Lotka-Volterra model, this would translate into \(\alpha_{i,i} > \alpha_{i,j}\), the main condition for stable coexistence. Therefore, stable coexistence is possible even with high intraspecific variability, especially when this variability is not intrinsic but due to environmental heterogeneity that is structured in space.

5 Code implementation

The analyses were conducted using the R language [16] in the Rstudio environment [17]. The tables were made with the kableExtra package [18], the figures with the package ggplot2 [19], and the code uses other packages of the Tidyverse [20] (tidyr [21], dplyr [22], readr [23], lubridate [24], magrittr [25], glue [26]) and other R packages (here [27], bayesplot [28], ggpubr [29], sp [30,31], ggnewscale [32], gstat [15]). Some functions were implemented in C++ thanks to the R packages Rcpp [33,34,35] and RcppArmadillo [36]. The pdf and html documents were produced thanks to the R packages rmarkdown [39], knitr [42] and bookdown [43].

References

1
Gourlet-Fleury, S. et al., eds. (2004) Ecology and management of a neotropical rainforest: Lessons drawn from Paracou, a long-term experimental research site in French Guiana, Elsevier.
2
Hérault, B. and Piponiot, C. (2018) Key drivers of ecosystem recovery after disturbance in a neotropical forest: Long-term lessons from the Paracou experiment, French Guiana. Forest Ecosystems 5, 2
3
Pélissier, R. et al. (2011) Tree demography in an undisturbed Dipterocarp permanent sample plot at Uppangala, Western Ghats of India: Ecological Archives E092-115. Ecology 92, 1376–1376
4
Bec, J.L. et al. (2015) Characterizing tropical tree species growth strategies: Learning from inter-individual variability and scale invariance. PLoS ONE 10,
5
Pascal, J.-P. and Pelissier, R. (1996) Structure and Floristic Composition of a Tropical Evergreen Forest in South-West India. Journal of Tropical Ecology 12, 191–214
6
Condit, R. et al. Complete data from the Barro Colorado 50-ha plot: 423617 trees, 35 years. (2019), Dryad
7
Bürkner, P.-C. (2017) Brms : An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software 80,
8
Bürkner, P.-C. (2018) Advanced Bayesian Multilevel Modeling with the R Package brms. The R Journal 10, 395
9
Paradis, E. and Schliep, K. (2019) Ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528
10
Molino, J.-F. and Sabatier, D. (2001) Tree Diversity in Tropical Rain Forests: A Validation of the Intermediate Disturbance Hypothesis. Science 294, 1702–1704
11
Hubbell, S.P. (1999) Light-Gap Disturbances, Recruitment Limitation, and Tree Diversity in a Neotropical Forest. Science 283, 554–557
12
Maire, G. le et al. (2019) Light absorption, light use efficiency and productivity of 16 contrasted genotypes of several Eucalyptus species along a 6-year rotation in Brazil. Forest Ecology and Management 449, 117443
13
Clark, J.S. et al. (2007) Resolving the biodiversity paradox. Ecology Letters 10, 647–659
14
Pebesma, E.J. (2004) Multivariable geostatistics in S: The gstat package. Computers & Geosciences 30, 683–691
15
Gräler, B. et al. (2016) Spatio-Temporal Interpolation using gstat. The R Journal 8, 204
16
R Core Team (2021) R: A language and environment for statistical computing, R Foundation for Statistical Computing.
17
RStudio Team (2021) RStudio: Integrated development environment for r, RStudio, PBC.
18
Zhu, H. (2021) kableExtra: Construct complex table with ’kable’ and pipe syntax,
19
Wickham, H. (2009) Ggplot2: Elegant graphics for data analysis, Springer.
20
Wickham, H. et al. (2019) Welcome to the Tidyverse. Journal of Open Source Software 4, 1686
21
Wickham, H. (2021) Tidyr: Tidy messy data,
22
Wickham, H. et al. (2021) Dplyr: A grammar of data manipulation,
23
Wickham, H. and Hester, J. (2020) Readr: Read rectangular text data,
24
Grolemund, G. and Wickham, H. (2011) Dates and Times Made Easy with lubridate. Journal of Statistical Software 40,
25
Bache, S.M. and Wickham, H. (2020) Magrittr: A forward-pipe operator for r,
26
Hester, J. (2020) Glue: Interpreted string literals,
27
Müller, K. (2020) Here: A simpler way to find your files,
28
Gabry, J. and Mahr, T. Bayesplot: Plotting for bayesian models. (2021)
29
Kassambara, A. (2020) Ggpubr: ’ggplot2’ based publication ready plots,
30
Pebesma, E.J. and Bivand, R.S. (2005) Classes and methods for spatial data in R. R News 5, 9–13
31
Bivand, R. et al. (2013) Applied spatial data analysis with R, Second edition.Springer.
32
Campitelli, E. (2021) Ggnewscale: Multiple fill and colour scales in ’ggplot2’,
33
Eddelbuettel, D. and François, R. (2011) Rcpp : Seamless R and C++ Integration. Journal of Statistical Software 40,
34
Eddelbuettel, D. (2013) Seamless R and C++ integration with Rcpp, Springer.
35
Eddelbuettel, D. and Balamuta, J.J. (2018) Extending r with C++: A Brief Introduction to Rcpp. The American Statistician 72, 28–36
36
Eddelbuettel, D. and Sanderson, C. (2014) RcppArmadillo: Accelerating R with high-performance C++ linear algebra. Computational Statistics & Data Analysis 71, 1054–1063
37
Allaire, J. et al. Rmarkdown: Dynamic documents for R. (2020)
38
Xie, Y. et al. (2019) R Markdown: The definitive guide, CRC Press, Taylor; Francis Group.
39
Xie, Y. et al. (2020) R markdown cookbook, (1st edn) Taylor; Francis, CRC Press.
40
Xie, Y. (2021) Knitr: A general-purpose package for dynamic report generation in r,
41
Xie, Y. (2015) Dynamic documents with R and Knitr, Second edition.CRC Press/Taylor & Francis.
42
Xie, Y. (2014) Knitr: A Comprehensive Tool for Reproducible Research in R. In Implementing reproducible research (Stodden, V. et al., eds), CRC Press, Taylor & Francis Group
43
Xie, Y. (2017) Bookdown: Authoring books and technical publications with R Markdown, CRC Press.